Welcome to the questionnaire design guide!
An aim of this course is to develop your ability to translate business problems into actionable research questions and to design an adequate research plan to answer these questions. Therefore, you need to be equipped with knowledge of how to create a survey and properly conduct research.
Generally, what you can expect from survey design is similar to what one experiences in a relationship. If you try to take more than you commit, it doesn't work out. Now on a serious note, if you follow the guidelines mentioned here, you will avoid the usual traps your fellow colleagues were caught in.
In a research process, conducting a survey is a part of (primary) data collection. Before we collect data, we have to make sure that preceding steps are correctly done. However, in the following sections we will focus on the process of designing a questionnaire. Eventually, you will be able to collect relevant data and apply appropriate statistical tests.
A structured questionnaire is a research instrument designed to elicit specific information from a sample of a target population. Usually it is used in a standardized way with fixed-alternative questions (same questions and response options for all respondents).
An objective of a questionnaire is threefold:
In order to meet these objectives, a questionnaire design process suggests the following sequence of steps:
The questionnaire design should be aligned with the research design! To align them, it is necessary to review the components of the problem and the approach. In particular, you should review the research questions, hypotheses and characteristics that influence the research design.
If you are interested in the causal effect of one particular (independent) variable on another (dependent) variable, think about an experimental design that might allow you to manipulate this variable. In this case, you particularly have to decide on the following:
What you need to be careful about is reverse causation. It refers to the situation where the causal relationship could possibly run in the opposite direction from what we assumed in the first place. For instance, it is often assumed that an increase in individual income leads to an increase in well-being (happiness). However, some research suggests that this causation could run in the opposite direction, i.e. that an increase in an individual's well-being actually leads to an increase in income.
Here are some examples of causal research design applications:
If you would like to analyze the effects of multiple categorical or continuous (independent) variables on one continuous (dependent) variable, you might use a regression model. When doing this, you particularly have to decide on:
How to measure the dependent variable (DV). This is particularly important, since you need a variable that is powerful in uncovering variation between subjects (e.g., open-ended questions such as "How much are you willing to pay for this product?" are good candidates). Moreover, you also need to consider the nature of your DV, i.e. whether it is an interval, ordinal or categorical variable. The nature of your DV will heavily influence your choice of a correct statistical test.
How to measure the independent variables (IV) (single-item vs. multi-item scales, categorical vs. continuous). Bear in mind that the nature of the IV, together with the DV, affects your choice of a statistical test as well.
What other variables might cause the effect that you would like to investigate (to prevent omitted variable bias, i.e. bias from variables that are not part of your model but still influence the dependent variable).
Potential interactions (e.g., is the effect of variable X stronger for group A vs. B?)
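The decisions listed above can be illustrated with a minimal sketch in R. It runs on simulated data, so every variable name (`wtp`, `ad_liking`, `group`, `income`) is hypothetical and not part of the course material. The model only shows how a continuous DV, a continuous and a categorical IV, a control variable and an interaction term fit together in one regression.

```r
set.seed(42)  # make the simulated example reproducible
n <- 200

# Hypothetical survey data (all names are assumptions for illustration)
survey <- data.frame(
  ad_liking = rnorm(n, mean = 4, sd = 1),                      # continuous IV
  group     = factor(sample(c("A", "B"), n, replace = TRUE)),  # categorical IV
  income    = rnorm(n, mean = 3, sd = 0.5)                     # control variable
)

# Simulated continuous DV: the effect of ad_liking is stronger in group B
survey$wtp <- 5 + 1.0 * survey$ad_liking +
  1.5 * survey$ad_liking * (survey$group == "B") +
  0.8 * survey$income + rnorm(n)

# ad_liking * group expands to both main effects plus their interaction;
# income is added to reduce the risk of omitted variable bias
model <- lm(wtp ~ ad_liking * group + income, data = survey)
summary(model)
```

The coefficient labelled `ad_liking:groupB` estimates how much stronger (or weaker) the effect of `ad_liking` is in group B than in group A.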
In the next step you should review the type of interviewing method you will use. At this point you need to think about the setting in which you aim to conduct your survey. For instance, should you do it face-to-face or rather online? Here you can find some advantages and disadvantages of online surveys:
Additionally, here is the list of the online tools you can use to conduct an online survey (usually for free):
In this step you start to work on the content of your questions. There are several questions you should ask yourself when writing questions:
In your survey, try to avoid asking double-barrelled questions. A double-barrelled question is a single question that attempts to cover two issues. Such questions can be confusing to respondents and result in ambiguous responses. Instead, you might ask multiple questions in order to obtain the intended information.
Incorrect:
Do you think Nike Town offers better variety and prices than other Nike stores?
Correct:
Do you think Nike Town offers better variety than other Nike stores?
Do you think Nike Town offers better prices than other Nike stores?
The quality of the data you collect depends heavily on your ability to address the right participants. Therefore, you need to make sure that your respondents are able to answer your questions meaningfully.
Examples:
If you are asking participants to recall certain brands, for instance, make sure you use an unaided recall question:
Example of unaided recall question:
What brands of soft drinks do you remember being advertised on TV last night?
Example of aided recall question:
Which of these brands were advertised last night on TV?
a) Coca-Cola
b) Pepsi
c) Red Bull
d) Evian
e) Don't know
If you are asking participants to list something, good practice is to minimize the effort required of respondents:
Incorrect:
Please list all the departments from which you purchased merchandise on your most recent shopping trip to department store X.
Correct:
Please check all the departments from which you purchased merchandise on your most recent shopping trip to a department store:
a) Women's dresses
b) Men's apparel
c) Children's apparel
d) Cosmetics
e) Jewelry
f) Other (please specify) ___________
If you are asking for information that could be considered sensitive (e.g. money, family life, political beliefs, religion), these questions should come at the end of the questionnaire. Moreover, it is advisable to provide response categories rather than asking for specific figures:
Incorrect:
What is your household's exact annual income?
Correct:
Which one of the following categories best describes your household's annual gross income?
a) under 25.001 €
b) 25.001 € to 50.000 €
c) 50.001 € to 75.000 €
d) 75.001 € to 100.000 €
e) over 100.000 €
Every statistical analysis requires that variables have a specific level of measurement. The measurement scales you choose for your survey questions affect the answers you get and, eventually, the statistical tests you can apply. For instance, it does not make sense to compute an average of genders, because an average of a categorical variable has no meaningful interpretation. Even if you computed the average of gender coded numerically (e.g. male=0, female=1), the result could only be read as the share of respondents coded 1, not as an "average gender".
Therefore, it is crucial to become familiar with the possibilities of each scale before you add another question to your survey. This way, you significantly lower the risk of collecting data you did not intend to collect and of being unable to apply the tests you had planned.
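A small sketch in R makes the point about measurement levels concrete. The five coded answers below are made up for illustration, and the coding (0 = male, 1 = female) simply follows the example in the text.

```r
# Hypothetical coded answers: 0 = male, 1 = female
gender_code <- c(0, 1, 1, 0, 1)

# R averages the codes without complaint; the result is only the share of 1s
mean(gender_code)

# Declaring the variable as a factor makes its nominal level explicit
gender <- factor(gender_code, levels = c(0, 1), labels = c("male", "female"))

table(gender)              # counts per category
prop.table(table(gender))  # proportions per category
```

For a nominal variable, counts and proportions are the appropriate summaries, which is why converting the codes to a factor early helps avoid meaningless averages later on.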
In the following table you can get a quick overview of the possibilities of each measurement scale:
In the table below you can find a general procedure for choosing a correct analysis based on the measurement scale of your data and the number of variables. It shows the statistical analyses we covered during the course and aims to help you choose among them based on the nature of the dependent variable on the one side, and the nature and number of your independent variables on the other side:
Scaling techniques are meant to study the relationship between objects. They are basically classified into comparative and noncomparative scales.
In noncomparative scales, each object is scaled independently of the other objects. The resulting data are generally assumed to be interval or ratio scaled.
Comparative scales (or nonmetric scaling) directly compare the stimulus objects. For example, the respondent might be asked directly about his or her preference between domestic and foreign beer brands. As a result, the comparative data collected can only be interpreted in relative terms. In the following sections we will walk through both types of scales and briefly introduce them.
In the table below you can find a couple of commonly measured constructs in marketing research such as attitude, importance, purchase intention and similar.
Typically, participants rate objects on a number of itemized, seven-point rating scales bounded at each end by one of two bipolar adjectives.
Semantic differential can measure respondent attitudes towards something (products, concepts, items, people, ...).
It helps you find the respondent's position on a scale between two bipolar adjectives such as "Sweet-Sour" or "Bright-Dark". In comparison to the Likert scale, which uses generic scales (e.g. extremely dissatisfied to extremely satisfied), semantic differential questions are posed within the context of evaluating attitudes.
Widely used rating scale in marketing research due to its versatility
When creating a semantic differential question, you should consider the following:
The sequence of questions in a questionnaire can play an important role. For instance, more sensitive questions (such as demographic questions) are usually placed at the end, as they can trigger a change in the respondent's behavior.
If you plan to conduct an online survey, then you need to think about the respondent's experience while filling out your questionnaire. For instance, spread the content over several short pages rather than a few long ones. In online surveys, two questions per page is a useful rule of thumb. Generally, respondents are reluctant to read and fill out long questionnaire pages. Hence, long pages will lead to a higher dropout rate. To reduce the dropout rate, state in the introduction of the questionnaire approximately how long the survey will take. Take into account that tools like Qualtrics provide the estimated response time in the survey overview.
Keep in mind that most people usually fill out surveys on their phones, so think about how the questionnaire will appear on a phone screen too. In that regard, pay particular attention to the length of your questions.
In the end, the questionnaire structure has to be aligned with the research design. For example, if your research design features an experiment, this needs to be reflected in the questionnaire (e.g., you need to assign the respondents randomly to the experimental conditions in case of a between-subjects comparison).
In a between-subject design you randomly assign each respondent to one of the experimental conditions. Respondents then complete tasks only in the condition to which they are assigned.
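Random assignment itself can be sketched in a few lines of R. The respondent IDs and the condition labels (`ad_A`, `ad_B`) are hypothetical. In an online survey the survey software typically handles this step for you, so the sketch only illustrates the logic.

```r
set.seed(123)          # reproducible assignment for this sketch
respondent_id <- 1:20  # hypothetical respondent IDs

# Shuffle a balanced vector of condition labels so both groups are equally large
condition <- sample(rep(c("ad_A", "ad_B"), length.out = length(respondent_id)))

assignment <- data.frame(respondent_id, condition)
table(assignment$condition)  # number of respondents per condition
```

Shuffling a balanced vector, rather than drawing each label independently, keeps the group sizes equal while leaving the order of assignment random.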
For instance, we would like to test the effect of two advertisements on purchase intention. Therefore, one group of (randomly assigned) respondents will be exposed to one advertisement version, while the other group (of randomly assigned respondents) will be exposed to another version. After that, both groups of respondents should express their willingness to buy the advertised product. Eventually, if the dependent variable (e.g. willingness to buy) is measured on an interval or ratio scale, you can use an independent t-test to compare the group means. The whole experimental design should be organised as follows:
A within-subject design involves exposing each respondent to all of the experimental conditions you are testing. This way, each respondent will test all of the conditions.
For instance, we would like to test again the effect of two advertisements on purchase intention, but this time in a within-subject design. First, each respondent will be exposed to the first version of the advertisement and right after that asked to rate his/her willingness to buy the advertised product. Subsequently, each participant will be shown another version of the advertisement and again rate his/her willingness to purchase the advertised product. Finally, we can compare the two means with a paired-samples t-test (given that the data is measured on an interval or ratio scale).
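The two comparisons described for the between-subject and the within-subject design can be sketched in R as follows. The ratings are simulated, and all object names (`wtb_ad_A`, `first_rating`, and so on) are hypothetical. The sketch only shows which form of `t.test()` matches which design.

```r
set.seed(1)  # reproducible simulated ratings

# Between-subject: two separate groups, each sees one advertisement
wtb_ad_A <- rnorm(30, mean = 4.0, sd = 1)
wtb_ad_B <- rnorm(30, mean = 4.8, sd = 1)
between <- t.test(wtb_ad_A, wtb_ad_B)  # independent samples (Welch test by default)

# Within-subject: the same 30 respondents rate both advertisements
first_rating  <- rnorm(30, mean = 4.0, sd = 1)
second_rating <- first_rating + rnorm(30, mean = 0.5, sd = 0.7)
within <- t.test(first_rating, second_rating, paired = TRUE)  # paired samples

between$p.value
within$p.value
```

The paired test works on the difference between each respondent's two ratings, which is why it requires both ratings to come from the same person and in the same order.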
Generally, question wording should enable each respondent to understand questions and to be able to answer them with reliability. Reliability means that, if a respondent was asked the same question again, he/she would give the same answer again. A number of common problems regarding the question wording have been identified, so we will address the most important ones.
In order to ensure reliability, the issue in terms of who, what, when and where should be defined in each question.
Example: Which brand of shampoo do you use?
Who (the respondent): It is not clear whether this question relates to the individual respondent or the respondent's total household.
What (the brand of shampoo): It is unclear how the respondent is to answer this question if more than one brand is used.
When (unclear): The time frame is not specified in this question. The respondent could interpret it as meaning the shampoo used this morning, this week, or over the past year.
Where (not specified): At home, at the gym? Where?
A more clearly defined question is:
Which brand or brands of shampoo have you personally used at home during the last month? In the case of more than one brand, please list all the brands that apply.
Use ordinary words. Words should match the vocabulary level of the participants.
Incorrect:
"Do you think the distribution of soft drinks is adequate?"
Correct:
"Do you think soft drinks are easily available when you want to buy them?"
Avoid double negative form. Double negative question forms can confuse respondents, especially when they need to answer with "Agree" or "Disagree".
Incorrect:
Do you think that it is not uncommon that boys play basketball?
Correct:
In your opinion, is it common that boys play basketball?
Avoid leading questions. Leading questions clue the participant to what the answer should be. Such questions introduce a bias in a particular direction.
Incorrect:
"Is Colgate your favorite toothpaste?"
Correct:
"What is your favorite brand of toothpaste?"
Avoid ambiguous words. Words such as usually, normally, frequently, often, regularly, and other similar words, do not define frequency clearly enough.
Incorrect:
"In a typical month, how often do you go to a movie theater to see a movie?"
a) Never
b) Occasionally
c) Sometimes
d) Often
e) Regularly
Correct:
"In a typical month, how often do you go to a movie theater to see a movie?"
a) Less than once
b) 1 or 2 times
c) 3 or 4 times
d) More than 4 times
One of the last steps in the process of designing a questionnaire is choosing an adequate order of questions and instructions for respondents.
At the beginning, you should provide a short and easy-to-understand introduction to the topic. Use simple language and avoid technical terms (e.g., not many people will know the terms "manufacturer brand" and "store brand"). Additionally, in the introduction you should state approximately how long the survey will take.
The opening questions should be interesting, simple and non-threatening. They are crucial because they are the respondent's first exposure to the questionnaire and are likely to set the tone for the rest of the questions. If they are too difficult to understand, or sensitive in some way, respondents are likely to stop answering your questions. Qualifying questions (or screening questions) should serve as the opening questions (if applicable). Their purpose is to identify potential respondents who are eligible to proceed with the survey.
After the opening part, you should establish an optimal question flow. General questions should precede the specific questions. Questions on one subject, or one particular aspect of a subject, should be grouped together. Respondents may find it confusing to be asked to return to a subject they thought they had already given their opinions about.
As respondents are moving towards the end of the questionnaire, they are likely to become increasingly indifferent and might give careless answers. Therefore, questions of special importance should ideally be included in the earlier part of the questionnaire.
Finally, you should pay particular attention to providing all required definitions and explanations before you ask a question. This ensures that the questions are understood in a consistent way by every respondent.
Before you distribute the final questionnaire, there are some things to consider. First, you should always pretest your questionnaire before sharing it! Test all aspects of the questionnaire (content, wording, sequence, form & layout, etc.). If possible, use respondents in the pretest that are similar to those who will be included in the actual survey. Ideally, the pretest sample size should be small (in a real scenario this could vary from 15 to 30 respondents; for the group project, a lower number will be sufficient). After each significant revision of the questionnaire, conduct another pretest, using a different sample of respondents. Eventually, code and analyze the responses obtained from the pretest to make sure that you collected the information you intended to collect.
After testing your questionnaire you should be able to determine whether:
Creating a questionnaire in Qualtrics starts with creating a Qualtrics project. Each project consists of a survey, a distribution record, and a collection of responses and reports. There are three ways to create a questionnaire. First, you can create a new survey project from scratch. Second, you can create a new questionnaire from a copy of an existing questionnaire. Finally, you can create one from a template in your Survey Library, or from an exported QSF file.
In order to create a completely new questionnaire, you need to do the following:
Go to the Projects page by clicking the Qualtrics XM logo or clicking Projects on the top-right.
Create new project by clicking the blue button on the right side.
In the "Create your own" section click on the survey button.
Enter a name for your survey and get started with a survey creation.
If you would like to create a new questionnaire on the basis of an existing one, choose "From a Copy". Subsequently, you need to indicate the questionnaire you would like to copy. Now you are good to go!
If there is a questionnaire in the Qualtrics Library you would like to use, choose "From Library" and indicate the library name in the dropdown menu.
In this chapter we will look at the nature of the data you collect when conducting a survey. It will help you choose a question type depending on the nature of the data you want to collect and on the type of statistical tests you want to apply.
Multiple Choice with a single answer is a type of closed-ended question that lets respondents select one answer from a defined list of choices.
The type of data you obtain is categorical, and the output comes in the following form:
| Q5_1 | Q5_2 | Q5_3 | Q5_4 | Q5_4_TEXT |
|---|---|---|---|---|
| NA | NA | 1 | NA | |
| 1 | NA | NA | NA | |
| 1 | NA | NA | NA | |
| NA | NA | 1 | NA | |
| NA | NA | NA | 1 | Vending machines with coffee |
| 1 | NA | NA | NA | |
What to do with this data now? First, we need to load it into R and prepare it for analysis. R recognizes the numbers you see in the output as integers. In order to conduct statistical modelling and properly visualize our results, we need to convert our data to the factor class.
A factor (or coding variable) represents different groups of data by using numbers (integers). In fact, factors appear as numeric variables, but they hold the meaning of labels/names of data groups, i.e. they represent a nominal variable. These data groups are represented in the form of "levels".
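The conversion to a factor can be sketched as follows. The data frame below is a hand-typed stand-in for the six rows shown in the table above, not the real export. The mapping of `Q5_1` to `Q5_4` onto the four labels, the object name `qualtrics_wide` and the `respondent` column are all assumptions made for illustration.

```r
# Hypothetical wide data mimicking the six rows shown above (1 = option selected)
qualtrics_wide <- data.frame(
  Q5_1 = c(NA, 1, 1, NA, NA, 1),
  Q5_2 = rep(NA_real_, 6),
  Q5_3 = c(1, NA, NA, 1, NA, NA),
  Q5_4 = c(NA, NA, NA, NA, 1, NA)
)
choice_cols <- c("Q5_1", "Q5_2", "Q5_3", "Q5_4")

# For each respondent, find the position of the column that holds the 1
selected <- as.integer(
  apply(qualtrics_wide[choice_cols], 1, function(row) which(row == 1)[1])
)

# Convert the integer codes to a factor with readable levels
# (the assignment of labels to columns is an assumption)
qualtrics_long <- data.frame(
  respondent = seq_len(nrow(qualtrics_wide)),
  value = factor(selected, levels = 1:4,
                 labels = c("Grocery shop", "Online shop",
                            "Specialized coffee shop", "Other"))
)

str(qualtrics_long$value)
table(qualtrics_long$value)
```

Because the levels are declared explicitly, an option that nobody selected still appears in the table with a count of zero instead of silently disappearing.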
In our case, the multiple choice question output will contain 4 data groups ("Grocery shop", "Online shop", "Specialized coffee shop", "Other") after converting it to a factor:
# Table
table(qualtrics_long$value)
Grocery shop Online shop Specialized coffee shop
6 6 2
Other
3
Second, you might want to visualize your results. In order to do so, the data needs to be in the appropriate format. Here we proceed with adapting the data format from the point where we stopped:
# Converting long format to the visualisation-friendly format
mlc_visualisation <- as.data.frame(table(qualtrics_long$value))
# Naming columns
names(mlc_visualisation) <- c('Place','Count')
# Observing
mlc_visualisation
The simplest way to visualize data obtained from a multiple choice question with a single answer is a bar chart:
## Base R bar chart
labels <- as.character(mlc_visualisation$Place)
barplot(mlc_visualisation$Count, # Column to visualize
xlab='Places to buy coffee', # X-axis label
ylab = 'Count(answers)', # Y-axis label
names.arg = labels,
main = 'Where do you buy your coffee?') # Title
R package ggplot2 allows you to create visually appealing graphs:
## ggplot2 bar chart
library(ggplot2)
p <- ggplot(data=mlc_visualisation,
aes(x=Place, y=Count, fill=Place)) +
geom_bar(stat='identity') + theme_minimal() # ggplot2 basic barplot
p
Another R library which can help you make interactive charts in a minute is plotly. Here we use a function called ggplotly(), which allows you to turn a ggplot2 chart into an interactive one. Since we have already created a bar chart using ggplot2 and saved it as "p", we will just turn it into a plotly graph:
library(plotly)
ggplotly(p)
A related package with a similar grammar is ggvis, which is aimed at interactive graphics and is still under development:
library(ggvis)
ggvis(mlc_visualisation,
x = ~Place,
y = ~Count,
fill=~Place)